Naïve Terminological Annotation of Legal Texts in Slovak

Abstract

Correct automatic terminological annotation of texts in a corpus can sometimes be a challenging task, especially for moderately or heavily inflected languages with relatively free word order. We explore the possibility of a simple approach based on sequence matching of lemmatized text to annotate Slovak-language texts with IATE entries. The accuracy of annotating legal texts is very good for multiword terms, while the accuracy for single-word terms can be increased by applying filters on term lengths and blacklisting the most frequent false positives.
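The matching idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the text has already been lemmatized by an external tool, and the function names, the tiny sample term list, the length threshold, and the blacklist are all hypothetical.

```python
from __future__ import annotations

from typing import Iterable

Term = tuple[str, ...]


def build_term_index(terms: Iterable[Term]) -> dict[str, list[Term]]:
    """Index lemmatized terms by their first lemma, longest terms first."""
    index: dict[str, list[Term]] = {}
    for term in terms:
        if term:  # skip empty entries
            index.setdefault(term[0], []).append(term)
    for candidates in index.values():
        candidates.sort(key=len, reverse=True)
    return index


def annotate(
    lemmas: list[str],
    index: dict[str, list[Term]],
    min_single_len: int = 1,
    blacklist: frozenset[str] = frozenset(),
) -> list[tuple[int, int, Term]]:
    """Return (start, end, term) spans over the lemma sequence; end is exclusive.

    Single-word terms are dropped when they are shorter than
    min_single_len or listed in blacklist (both hypothetical filters).
    """
    spans = []
    i = 0
    while i < len(lemmas):
        match = None
        for term in index.get(lemmas[i], []):
            if tuple(lemmas[i:i + len(term)]) != term:
                continue
            if len(term) == 1 and (
                len(term[0]) < min_single_len or term[0] in blacklist
            ):
                continue
            match = term
            break
        if match is not None:
            spans.append((i, i + len(match), match))
            i += len(match)  # continue after the matched term
        else:
            i += 1
    return spans


# Hypothetical sample data: terms and text are assumed to be lemmatized already.
TERMS = [("členský", "štát"), ("európsky", "únia"), ("právo",), ("súd",)]
lemmas = ["členský", "štát", "európsky", "únia", "mať", "právo", "na", "súd"]
index = build_term_index(TERMS)

filtered = annotate(lemmas, index, min_single_len=4, blacklist=frozenset({"právo"}))
unfiltered = annotate(lemmas, index)
```

With the filters switched on, only the two multiword terms are annotated in this sample; without them, the single-word entries match as well, which is where false positives would be expected to appear.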

Similar resources

Detecting Commas in Slovak Legal Texts

This paper reports on initial experiments with automatic comma recovery in legal texts. In deciding whether to insert a comma or not, we propose to use the value of the probability of a bigram of two words without a comma and a trigram of the words with the comma. The probability is determined by the language model trained on sentences with commas labeled as separate words. In the training data...

Semantic Annotation of Legal Texts through a FrameNet-Based Approach

In this work we illustrate a novel approach for solving an information extraction problem on legal texts. It is based on Natural Language Processing techniques and on the adoption of a formalization that allows coupling domain knowledge and syntactic information. The proposed approach is applied to extend an existing system to assist human annotators in handling normative modificatory provision...

Learning from Texts - a Terminological Metareasoning Perspective

We introduce a methodology for concept learning from texts that relies upon second-order reasoning about statements expressed in a (first-order) terminological representation language. This metareasoning approach allows for quality-based evaluation and selection of alternative concept hypotheses...

Learning from texts - a terminological metareasoning perspective

We introduce a methodology for concept learning from texts that relies upon second-order reasoning about statements expressed in a (first-order) terminological representation language. This metareasoning approach allows for quality-based evaluation and selection of alternative concept hypotheses. 1 Introduction: In this paper, we consider the problem of concept learning from a new met...

Extraction and analysis of proper nouns in Slovak texts

Unknown named entity recognition in inflected languages faces several specific problems: first and foremost, the entities themselves are inflected (Dvonč et al., 1966), which makes it hard to identify word forms as belonging to the same lexeme and to find the correct lemma. In this article we analyse the distribution of word forms for proper nouns in Slovak and ...

Journal

Journal title: Rasprave: Časopis Instituta za Hrvatski Jezik i Jezikoslovlje

Year: 2022

ISSN: 1331-6745, 1849-0379

DOI: https://doi.org/10.31724/rihjj.48.1.2